Generalized linear models offer a framework for applying something like linear regression in all sorts of different contexts.
All GLMs share the same basic setup: we're predicting y as a function of some predictors and coefficients, transformed by a link function \(g\):

\[ E(Y|X) = g^{-1}(X\beta) \]
We’ll talk about a couple other common models that use this same framework.
Count models can be used to model rates or outcomes that take non-negative integer values (0, 1, 2, …). Some examples might include:

- Number of battle deaths in a conflict
- Number of terror attacks in a country
- Number of protests in a year
- Number of years between coups (although this may be better classified as a survival model)
The simplest way to model a count is with a Poisson distribution: \(y \sim \text{Poisson}(e^{X\beta})\)
The expected value of \(Y|X\) is
\[ E(Y|X) = e ^{(b_0 + b_1X_1 +b_2X_2...)} \]
```r
# Inverse link for the Poisson model: the expected count
poisson_func <- function(x, beta0, beta1) {
  exp(beta0 + beta1 * x)
}

curve(poisson_func(x, 0, 1), xlim = c(-1, 5), lwd = 2,
      ylab = "poisson_func function", las = 1)
curve(poisson_func(x, -2, 1), xlim = c(-1, 5), lwd = 2, lty = 2, col = "red", add = TRUE)
curve(poisson_func(x, 2, 1), xlim = c(-1, 5), lwd = 2, lty = 2, col = "blue", add = TRUE)
legend(-1, 150, c(expression(paste(beta[0], " = 0")),
                  expression(paste(beta[0], " = -2")),
                  expression(paste(beta[0], " = 2"))),
       lwd = 2, lty = c(1, 2, 2), col = c("black", "red", "blue"))
```
```r
curve(poisson_func(x, 3, -2), xlim = c(-1, 5), lwd = 2,
      ylab = "poisson_func function", las = 1)
curve(poisson_func(x, 3, 0), xlim = c(-1, 5), lwd = 2, lty = 2, col = "blue", add = TRUE)
curve(poisson_func(x, 3, 2), xlim = c(-1, 5), lwd = 2, lty = 2, col = "red", add = TRUE)
legend(3, 150, c(expression(paste(beta[1], " = 0")),
                 expression(paste(beta[1], " = -2")),
                 expression(paste(beta[1], " = 2"))),
       lwd = 2, lty = c(2, 1, 2), col = c("blue", "black", "red"))
```

As with logit and probit models, the link function means that our coefficient values are somewhat unintuitive. Technically, they represent the effect of a one-unit change in X on the log of the expected count of the dependent variable.
```r
model <- glm(nkill ~ e_gdppc,
             data = gtd_counts, family = 'poisson')

modelsummary(list("Number killed" = model),
             coef_rename = TRUE,
             estimate = "{estimate}",
             statistic = c("conf.int"),
             conf_level = .95,
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title = 'Number killed in terrorist attacks in 2018')
```

|                           | Number killed  |
|---------------------------|----------------|
| (Intercept)               | 0.548          |
|                           | [0.400, 0.692] |
| GDP per capita ($100,000) | 0.602          |
|                           | [0.375, 0.821] |
| Num.Obs.                  | 176            |
| AIC                       | 1867.1         |
| BIC                       | 1873.5         |
| Log.Lik.                  | -931.564       |

Note: 95% CI in brackets
Exponentiating the coefficients turns this into a multiplicative effect: the exponentiated coefficients represent the multiplicative change in the rate of the dependent variable for a one-unit increase in X. So an exponentiated coefficient of 0.5 means that a one-unit increase in X halves the expected count, while a coefficient of 2 would mean doubling it.
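As a quick check, exponentiating the raw coefficients from the table above reproduces the exponentiated estimates reported below:

```r
# Raw Poisson coefficients from the table above
exp(c(intercept = 0.548, e_gdppc = 0.602))
# → 1.730 and 1.826 after rounding, matching the exponentiated table
```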
```r
model <- glm(nkill ~ e_gdppc,
             data = gtd_counts, family = 'poisson')

modelsummary(list("Number killed" = model),
             coef_rename = TRUE,
             estimate = "{estimate}",
             statistic = c("conf.int"),
             conf_level = .95,
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title = 'Number killed in terrorist attacks in 2018',
             exponentiate = TRUE)
```

|                           | Number killed  |
|---------------------------|----------------|
| (Intercept)               | 1.730          |
|                           | [1.492, 1.997] |
| GDP per capita ($100,000) | 1.826          |
|                           | [1.455, 2.274] |
| Num.Obs.                  | 176            |
| AIC                       | 1867.1         |
| BIC                       | 1873.5         |
| Log.Lik.                  | -931.564       |

Note: 95% CI in brackets
For predicted counts, we can simply take the linear prediction from the model and then exponentiate it:
```r
x <- cbind(1, e_gdppc = seq(0, 2, by = .1))
predictions <- exp(x %*% cbind(coef(model)))

data.frame(x, predictions) |>
  ggplot(aes(x = e_gdppc * 100000, y = predictions)) +
  geom_point() +
  geom_line() +
  theme_bw() +
  labs(x = 'GDP Per Capita ($USD)', y = 'Predicted deaths') +
  scale_x_continuous(labels = scales::label_currency())
```

The Poisson distribution assumes that the variance is equal to the mean (note that the distribution only has a single parameter: \(\lambda\)). This is rarely true for real-world count data:
```
[1] 2.238636
[1] 63.0513
```

Here the mean of the outcome (about 2.24) is far smaller than its variance (about 63).
```r
data.frame(dist = dist_poisson(lambda = mean(gtd_counts$nkill))) |>
  ggplot(aes(xdist = dist)) +
  stat_slab(alpha = .7) +
  stat_slab(data = gtd_counts, aes(x = nkill),
            fill = 'lightblue',
            inherit.aes = FALSE, alpha = .5) +
  theme_bw() +
  labs(title = 'Terrorist attacks in 2018 compared to a Poisson distribution with the same mean')
```

Models with more variance than expected are called "overdispersed", and overdispersion causes us to underestimate our standard errors.
This may be the result of different data generating processes, i.e.: one pattern for states with terrorist insurgencies, a separate pattern for those without. There are several ways to account for overdispersion:

- Quasipoisson models, which optimize a different likelihood function
- A random effects model with one random effect per observation
- Hurdle or zero-inflated models (in special cases)
- A negative binomial model
```r
tibble('a' = dist_negative_binomial(size = 1, .5),
       'b' = dist_negative_binomial(size = 10, .5),
       'c' = dist_negative_binomial(size = 100, .5)) |>
  ggplot() +
  stat_slab(aes(xdist = a), fill = 'lightblue', alpha = .7) +
  stat_slab(aes(xdist = b), fill = 'lightgreen', alpha = .7) +
  stat_slab(aes(xdist = c), fill = 'orange', alpha = .7) +
  theme_bw()
```

|                           | poisson        | negative binomial |
|---------------------------|----------------|-------------------|
| (Intercept)               | 0.548          | 0.508             |
|                           | [0.400, 0.692] | [-0.293, 1.411]   |
| GDP per capita ($100,000) | 0.602          | 0.697             |
|                           | [0.375, 0.821] | [-0.729, 2.647]   |
| Num.Obs.                  | 176            | 176               |
| AIC                       | 1867.1         | 424.6             |
| BIC                       | 1873.5         | 434.1             |
| Log.Lik.                  | -931.564       | -209.297          |

Note: 95% CI in brackets
The inverse link functions for the negative binomial and Poisson models are the same, so the exponentiated coefficients have the same interpretation as they did for the Poisson model:
```r
model_list <- list('poisson' = model,
                   'negative binomial' = nb_model)

modelsummary(model_list,
             coef_rename = TRUE,
             estimate = "{estimate}",
             statistic = c("conf.int"),
             conf_level = .95,
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title = 'Number killed in terrorist attacks in 2018. Exponentiated coefficients.',
             exponentiate = TRUE)
```

|                           | poisson        | negative binomial |
|---------------------------|----------------|-------------------|
| (Intercept)               | 1.730          | 1.662             |
|                           | [1.492, 1.997] | [0.746, 4.099]    |
| GDP per capita ($100,000) | 1.826          | 2.008             |
|                           | [1.455, 2.274] | [0.482, 14.113]   |
| Num.Obs.                  | 176            | 176               |
| AIC                       | 1867.1         | 424.6             |
| BIC                       | 1873.5         | 434.1             |
| Log.Lik.                  | -931.564       | -209.297          |

Note: 95% CI in brackets
The performance package's `check_overdispersion()` function tests for overdispersion using simulated residuals from a count model. The null hypothesis here is no overdispersion:

```
# Overdispersion test

dispersion ratio = 27.681
Pearson's Chi-Squared = 4816.528
p-value = < 0.001
```
Running the same check on the negative binomial model shows that the overdispersion has been addressed.
Since the only difference here is the addition of a single dispersion parameter, we can treat these two models as nested and use a likelihood ratio test to determine whether the negative binomial model is a better fit. Unsurprisingly, it is.
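In practice you could use something like `lmtest::lrtest(model, nb_model)`; the statistic can also be computed by hand from the log-likelihoods reported in the comparison table above:

```r
# Log-likelihoods from the model comparison table above
ll_pois <- -931.564
ll_nb   <- -209.297

# LR statistic: twice the difference in log-likelihood, with 1 df
# for the added dispersion parameter
lr_stat <- 2 * (ll_nb - ll_pois)
lr_stat
# [1] 1444.534

pchisq(lr_stat, df = 1, lower.tail = FALSE)  # p-value is effectively zero
```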
We can also use the marginaleffects package to get average marginal effects from the negative binomial model:

```r
mfx <- avg_comparisons(nb_model)

modelsummary(mfx,
             coef_rename = TRUE,
             estimate = "{estimate}",
             statistic = c("conf.int"),
             conf_level = .95,
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title = 'Average marginal effects from the negative binomial model.')
```

|          | (1)             |
|----------|-----------------|
| e_gdppc  | 2.270           |
|          | [-4.931, 9.472] |
| Num.Obs. | 176             |
| AIC      | 424.6           |
| BIC      | 434.1           |
| Log.Lik. | -209.297        |

Note: 95% CI in brackets
A random effects model also provides an alternative way to account for overdispersion, by adding a single random effect term for every case in the model.
Poisson and negative binomial models can also be used to model rates. For instance, we might want to model the number of deaths per attack rather than just the number of deaths in general. To do this, we'll need to include the logged number of attacks as an "offset" variable in the model. We'll also have to remove cases with zero attacks, since it doesn't make sense to measure a rate where the denominator is zero:
```r
deaths_per_attack <- gtd_counts |>
  filter(n > 0) |>
  glm.nb(nkill ~ e_gdppc + offset(log(n)),
         data = _)

modelsummary(deaths_per_attack,
             coef_rename = TRUE,
             estimate = "{estimate}",
             statistic = c("conf.int"),
             conf_level = .95,
             note = "Note: 95% CI in brackets",
             gof_omit = 'F|RMSE|R2$',
             title = 'Number killed per attack. Exponentiated coefficients.',
             exponentiate = TRUE)
```

|                           | (1)            |
|---------------------------|----------------|
| (Intercept)               | 0.554          |
|                           | [0.251, 1.404] |
| GDP per capita ($100,000) | 0.673          |
|                           | [0.146, 4.591] |
| Num.Obs.                  | 106            |
| AIC                       | 389.3          |
| BIC                       | 397.3          |
| Log.Lik.                  | -191.671       |

Note: 95% CI in brackets
Ordered logit/probit models can be used to model ordered categorical outcomes, such as an agree/disagree scale.
One way to think about these models is that they work like the latent-data formulation of the logit model, but with multiple thresholds that may be unequally spaced.
Our goal here would be to estimate both the effect of X on the latent outcome Y* and the thresholds for each of the response categories.
In a sense, you could think of this as a series of K-1 logistic regressions for each level of the ordered category:
\[ \begin{align*} Pr(Y > 1) & = \text{logit}^{-1}(X\beta) \\ Pr(Y > 2) & = \text{logit}^{-1}(X\beta - \text{cut}_2) \\ Pr(Y > 3) & = \text{logit}^{-1}(X\beta - \text{cut}_3) \\ ... \\ Pr(Y > K-1) & = \text{logit}^{-1}(X\beta - \text{cut}_{K-1}) \\ \end{align*} \]
```r
set.seed(1000)
N <- 1000
x <- rnorm(N)
ystar <- x * 3

# Cut the latent variable (plus logistic noise) into ordered categories
ycut <- cut(ystar + rlogis(N), c(-Inf, -9, -5, 0, 6, Inf),
            labels = c("lowest", "lower", "medium", "higher", "highest"))

df <- data.frame(x, ystar, ycut)
MASS::polr(ycut ~ x, data = df, method = 'logistic')
```

```
Call:
MASS::polr(formula = ycut ~ x, data = df, method = "logistic")

Coefficients:
       x 
2.961754 

Intercepts:
  lowest|lower   lower|medium  medium|higher higher|highest 
    -8.9015292     -4.7899480      0.1010896      5.7852156 

Residual Deviance: 1255.273 
AIC: 1265.273 
```
The terms labelled as intercepts here correspond to the cutpoints in my simulated latent variable, and the coefficient on x is just the effect of x on ystar.
To get predicted probabilities for individual response categories, we'll need to use the inverse of the logit function and calculate
\[ Pr(y = k) = \text{logit}^{-1}(X\beta-c_{k-1}) - \text{logit}^{-1}(X\beta-c_k) \]
We’ll get the coefficient values and the cutpoints from the model:
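A sketch, assuming the fitted model from the simulation above has been stored (e.g. `m <- MASS::polr(ycut ~ x, data = df, method = 'logistic')`); `polr` keeps the cutpoints in the `zeta` element:

```r
b <- coef(m)    # the slope: effect of x on the latent scale
cuts <- m$zeta  # the cutpoints ("Intercepts" in the printed output)
```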
Now we’ll calculate individual response probabilities:
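The calculation can be done entirely in base R using the estimates printed above. The value of x used for the output below isn't shown in the notes; x = 2 reproduces it:

```r
# Estimates copied from the printed polr output above
b <- 2.961754
cuts <- c(-8.9015292, -4.7899480, 0.1010896, 5.7852156)

xb <- 2 * b  # linear predictor at x = 2 (assumed; not shown in the original)

# Pr(y = k) = logit^-1(xb - c_{k-1}) - logit^-1(xb - c_k),
# padding the cutpoints with -Inf and Inf at the ends
p <- plogis(xb - c(-Inf, cuts)) - plogis(xb - c(cuts, Inf))
p
```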
```
  lowest|lower 
  3.643911e-07 
  lower|medium 
  2.187871e-05 
 medium|higher 
   0.002929454 
higher|highest 
       0.46253 
higher|highest 
     0.5345183 
```
Of course, we can also just use predict to get these same results:
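For example, rebuilding the simulated model from above and calling `predict()` with `type = "probs"`:

```r
library(MASS)

# Recreate the simulated data and model from above
set.seed(1000)
N <- 1000
x <- rnorm(N)
ycut <- cut(x * 3 + rlogis(N), c(-Inf, -9, -5, 0, 6, Inf),
            labels = c("lowest", "lower", "medium", "higher", "highest"))
m <- polr(ycut ~ x, data = data.frame(x, ycut), method = 'logistic')

# One probability per outcome category for the supplied x value
predict(m, newdata = data.frame(x = 2), type = "probs")
```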
Or we can use the ggeffects package (e.g. `ggpredict()`) to get predictions across all values of x and all levels of the outcome.
Here’s a more practical example using European Social Survey data for Great Britain. The dependent variable is agreement that richer countries have an obligation to take immigrants, and the predictors are whether the respondent has been unemployed in the last 5 years and their self-placement on a left/right scale:
```
Call:
MASS::polr(formula = richer_responsible ~ uemp5yr + lrscale, 
    data = gb, method = "logistic")

Coefficients:
   uemp5yrYes       lrscale 
 0.0007202287 -0.1513971718 

Intercepts:
          Disagree strongly|Disagree Disagree|Neither agree nor disagree 
                           -4.077814                           -1.316149 
   Neither agree nor disagree|Agree                Agree|Agree strongly 
                           -0.553276                            2.047397 

Residual Deviance: 1059.459 
AIC: 1071.459 
```
And we can get estimated average marginal effects from the model as well. Note that we have 5 different margins for each coefficient, one for each category of the DV:
```r
library(marginaleffects)

avg_comparisons(ologit) |>
  dplyr::select(term, group, contrast, estimate, conf.low, conf.high) |>
  mutate(across(where(is.numeric), .fns = ~round(.x, digits = 4)))
```

```
    Term                      Group Contrast Estimate   2.5 %  97.5 %
 lrscale          Disagree strongly       +1   0.0055  0.0002  0.0107
 lrscale                   Disagree       +1   0.0296  0.0075  0.0516
 lrscale Neither agree nor disagree       +1   0.0017 -0.0008  0.0041
 lrscale                      Agree       +1  -0.0287 -0.0500 -0.0075
 lrscale             Agree strongly       +1  -0.0080 -0.0144 -0.0015
 uemp5yr          Disagree strongly Yes - No   0.0000 -0.0122  0.0122
 uemp5yr                   Disagree Yes - No  -0.0001 -0.0704  0.0701
 uemp5yr Neither agree nor disagree Yes - No   0.0000 -0.0064  0.0063
 uemp5yr                      Agree Yes - No   0.0001 -0.0682  0.0685
 uemp5yr             Agree strongly Yes - No   0.0000 -0.0204  0.0205
```
Ordered logits make the most sense when you have a relatively small number of ordered categories and you think the differences between the responses are “unevenly spaced” with respect to your IVs.

If you have a lot of response categories (7 or more), a linear model is generally a reasonable alternative.
Even in cases with a smaller number of categories, the juice may not be worth the squeeze, but an ordered logit might be worth exploring if you’re unsatisfied with your results.